Linear and Quadratic Discriminant Analysis - LDA & QDA
Overview
Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) are two classic classifiers which differ mainly in the type of decision surface they employ:
- LDA uses a linear decision surface.
- QDA uses a quadratic decision surface.
These classifiers are favored for several reasons:
- They offer closed-form solutions that are computationally efficient.
- They inherently support multiclass classification.
- They do not require hyperparameter tuning.
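As a quick illustration of these points, here is a minimal sketch fitting both classifiers with their default settings; the iris dataset (three classes) is used purely as example data.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis

# A three-class problem; both models fit in closed form with default settings
X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis().fit(X, y)
qda = QuadraticDiscriminantAnalysis().fit(X, y)

# Multiclass predictions work out of the box, with no hyperparameters to tune
print(lda.predict(X[:3]))
print(qda.predict(X[:3]))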
Decision Boundaries
- LDA is limited to linear boundaries.
- QDA, with its quadratic boundaries, provides more flexibility and can adapt to more complex patterns.
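The difference in boundary shape is easiest to see visually. The sketch below assumes a recent scikit-learn (DecisionBoundaryDisplay requires version 1.1 or later) and uses a synthetic two-feature, three-class dataset purely for illustration.
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.inspection import DecisionBoundaryDisplay

# Synthetic data with two features so the boundaries can be drawn in the plane
X, y = make_blobs(n_samples=300, centers=3, random_state=0)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
classifiers = [(LinearDiscriminantAnalysis(), "LDA: linear boundaries"),
               (QuadraticDiscriminantAnalysis(), "QDA: quadratic boundaries")]
for ax, (clf, title) in zip(axes, classifiers):
    clf.fit(X, y)
    # Shade the predicted class regions, then overlay the training points
    DecisionBoundaryDisplay.from_estimator(clf, X, response_method="predict", alpha=0.3, ax=ax)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k", s=20)
    ax.set_title(title)
plt.show()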
Dimensionality Reduction with LDA
LDA is also used for supervised dimensionality reduction by projecting input data onto a linear subspace. This subspace is defined by the directions that maximize the separation between different classes. Key points include:
- The projected dimensionality is necessarily smaller than the number of classes (at most n_classes - 1 components), so the reduction can be quite substantial.
- The projection only makes sense in a multiclass setting.
- The n_components parameter in LDA specifies the target dimensionality of the projection; it is only used by transform and does not affect the fitting and prediction process.
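A small sketch of the projection itself, again using the iris dataset (3 classes, 4 features) as an illustrative example; with three classes, at most two components can be requested.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# n_components must be at most min(n_classes - 1, n_features); here that is 2
lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit(X, y).transform(X)
print(X_reduced.shape)  # (150, 2)

# Predictions are unaffected by n_components; it is only used by transform
print(lda.predict(X[:3]))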
Mathematical Formulation
General Formulation for LDA and QDA
Both classifiers derive from probabilistic models that assume the class conditional distribution of the data is Gaussian for each class $k$:

$$P(x \mid y = k) = \frac{1}{(2\pi)^{d/2}\, |\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k)\right)$$

where $d$ is the number of features, $\mu_k$ is the mean of class $k$, and $\Sigma_k$ is the covariance matrix for class $k$.
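Spelling out the prediction step implied by this model: by Bayes' rule, the posterior probability of class $k$ is

$$P(y = k \mid x) = \frac{P(x \mid y = k)\, P(y = k)}{\sum_{l} P(x \mid y = l)\, P(y = l)}$$

where $P(y = k)$ is the prior probability of class $k$. Taking the logarithm and dropping the denominator, which does not depend on $k$:

$$\log P(y = k \mid x) = -\frac{1}{2}\log|\Sigma_k| - \frac{1}{2}(x - \mu_k)^\top \Sigma_k^{-1}(x - \mu_k) + \log P(y = k) + \mathrm{cst}$$

The predicted class is the one maximizing this log-posterior.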
Specifics for QDA
- QDA allows each class to have its own covariance matrix $\Sigma_k$, leading to a more flexible classifier that can model more complex boundaries.
- If covariance matrices are diagonal, QDA simplifies to the Gaussian Naive Bayes classifier.
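To make the Naive Bayes connection concrete: if each $\Sigma_k$ is diagonal with entries $\sigma_{k,j}^2$, the Gaussian density above factorizes across features,

$$P(x \mid y = k) = \prod_{j=1}^{d} \frac{1}{\sqrt{2\pi\sigma_{k,j}^{2}}} \exp\left(-\frac{(x_j - \mu_{k,j})^{2}}{2\sigma_{k,j}^{2}}\right)$$

i.e. the features are treated as conditionally independent given the class, which is exactly the model behind Gaussian Naive Bayes.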
Specifics for LDA
- LDA assumes a shared covariance matrix across all classes, i.e. $\Sigma_k = \Sigma$ for every $k$, which simplifies the log-posterior above to a function that is linear in $x$, as spelled out below.
- As a result, LDA can be read as assigning a sample to the class whose mean is closest in Mahalanobis distance (computed with the shared covariance), adjusted by the class prior probabilities.
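Substituting $\Sigma_k = \Sigma$ into the log-posterior, the terms $-\frac{1}{2}\log|\Sigma|$ and $-\frac{1}{2}x^\top \Sigma^{-1} x$ are identical for every class and drop out of the comparison, leaving a score that is linear in $x$:

$$\delta_k(x) = x^\top \Sigma^{-1} \mu_k - \frac{1}{2}\mu_k^\top \Sigma^{-1} \mu_k + \log P(y = k)$$

The predicted class maximizes $\delta_k(x)$, which is the same as choosing the class mean closest to $x$ in Mahalanobis distance, up to the log-prior term.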
Shrinkage and Covariance Estimation
Shrinkage is a form of regularization that improves the estimation of covariance matrices; it is particularly useful when the number of features is much larger than the number of training samples. Key implementations include:
- Automatic shrinkage (shrinkage='auto'), which determines the optimal shrinkage amount analytically following the Ledoit and Wolf lemma.
- Manual setting of the shrinkage parameter between 0 and 1, where shrinkage=0 means full reliance on the empirical covariance matrix and shrinkage=1 means full reliance on the diagonal matrix of variances.
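A minimal sketch of these options; note that shrinkage (and the covariance_estimator parameter, available in scikit-learn 0.24 and later) only works with the lsqr and eigen solvers.
from sklearn.covariance import OAS
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Ledoit-Wolf automatic shrinkage
lda_auto = LinearDiscriminantAnalysis(solver='lsqr', shrinkage='auto')

# Fixed shrinkage intensity between 0 (empirical covariance) and 1 (diagonal of variances)
lda_fixed = LinearDiscriminantAnalysis(solver='lsqr', shrinkage=0.5)

# Alternatively, plug in a custom covariance estimator instead of a shrinkage value
lda_oas = LinearDiscriminantAnalysis(solver='lsqr', covariance_estimator=OAS())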
Estimation Algorithms
- SVD Solver: the default for LDA; it does not compute the covariance matrix explicitly, making it suitable when the number of features is large, though it does not support shrinkage.
- LSQR Solver: Computes the coefficients by solving linear equations, supporting shrinkage and custom covariance estimators.
- Eigen Solver: Optimizes the ratio of between-class scatter to within-class scatter and supports shrinkage.
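For reference, a short sketch of selecting each solver; the longer example below then puts fitting, prediction, and dimensionality reduction together.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# svd: default; avoids computing the covariance matrix, good when there are many features
lda_svd = LinearDiscriminantAnalysis(solver='svd')

# lsqr: solves a linear system; supports shrinkage and covariance_estimator
lda_lsqr = LinearDiscriminantAnalysis(solver='lsqr', shrinkage='auto')

# eigen: maximizes the between-class to within-class scatter ratio; also supports shrinkage
lda_eigen = LinearDiscriminantAnalysis(solver='eigen', shrinkage='auto')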
# Import the necessary classes from scikit-learn
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Example data standing in for X_train, y_train and X_test: the iris dataset (3 classes, 4 features)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Initialize Linear Discriminant Analysis with automatic shrinkage (requires the lsqr or eigen solver)
lda = LinearDiscriminantAnalysis(solver='lsqr', shrinkage='auto')

# Initialize Quadratic Discriminant Analysis
qda = QuadraticDiscriminantAnalysis()

# Fit both models on the training data
lda.fit(X_train, y_train)
qda.fit(X_train, y_train)

# Make predictions on the test data
lda_predictions = lda.predict(X_test)
qda_predictions = qda.predict(X_test)

# Use LDA for supervised dimensionality reduction; with 3 classes, at most 2 components are possible
lda_for_reduction = LinearDiscriminantAnalysis(n_components=2)
lda_for_reduction.fit(X_train, y_train)

# Project the training data onto the 2-dimensional discriminant subspace
X_train_reduced = lda_for_reduction.transform(X_train)
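After the projection, X_train_reduced has shape (n_samples, 2): the iris data used above has 3 classes, so at most 2 discriminant directions exist. The reduced features can be plotted directly or fed to any downstream estimator.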